A Recommendation Scheme for an Optimal Pre-processing Permutation Towards High-Quality Big Data Analytics

Seounghyun Kim (김성현); Young-Kyoon Suh (서영균); Byungchul Tak (탁병철)

정보과학회논문지 (Journal of KIISE)

Korean Title	고품질 빅데이터 분석을 위한 최적의 전처리 순열 추천 방법
English Title	A Recommendation Scheme for an Optimal Pre-processing Permutation Towards High-Quality Big Data Analytics
Author	Seounghyun Kim (김성현), Young-Kyoon Suh (서영균), Byungchul Tak (탁병철)
Citation	Vol. 47, No. 3, pp. 319-327 (March 2020)
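Korean Abstract (English translation)	Today, with the explosive growth of data, research on intelligent services based on big data analysis is being actively pursued in many fields. Big data analysis through data mining or machine learning requires pre-processing of the training data. Although incomplete or inappropriate pre-processing of a given dataset can produce unreliable analysis results, it is difficult for users to choose the optimal set of pre-processing functions, and the order in which to apply them, that yields the best results. To address this problem, this paper designs and implements a platform that analyzes and recommends a permutation of pre-processing functions optimized for the data provided by the user. An evaluation of the proposed recommendation method on real-world data shows that the optimal pre-processing permutation achieves the best accuracy, in contrast to the worst pre-processing permutation. By applying the proposed method, users can select the best pre-processing permutation and are thus expected to obtain high-quality big data analysis results.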
English Abstract	Today, due to the explosive increase in data, research on intelligent services based on big data analysis is being actively conducted in various domains. Pre-processing of training data is essential to big data analytics via data mining or machine learning. Although incomplete or inadequate pre-processing of a given dataset can result in unreliable analysis, it is challenging for users to choose the optimal set and sequence of pre-processing functions that leads to the best results. To address this problem, we have designed and implemented a pre-processing evaluation platform that analyzes the performance of various permutations of pre-processing functions on a given user dataset and then recommends the best permutation. Evaluation results on a real-world dataset demonstrate that the recommended pre-processing permutation yields the best accuracy, in contrast to the worst pre-processing permutation. By applying the method proposed in this paper, users can choose the best pre-processing permutation and are thus expected to obtain high-quality big data analysis results.
Keywords	pre-processing permutation; pre-processing optimization; big data pre-processing; pre-processing recommendation
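
The abstract describes evaluating permutations of pre-processing functions on a user dataset and recommending the most accurate one. The record gives no implementation details, so the following is only a minimal sketch of that idea under stated assumptions: the three pre-processing functions, the toy one-column dataset, the threshold classifier used as the accuracy measure, and the name `rank_permutations` are all hypothetical and are not taken from the paper.

```python
from itertools import permutations
from statistics import median
from typing import Callable, List, Optional, Sequence, Tuple

Column = List[Optional[float]]   # one numeric feature; None marks a missing value
Step = Callable[[Column], Column]  # a pre-processing function (hypothetical interface)


def impute_median(col: Column) -> Column:
    """Replace missing values with the median of the observed values."""
    observed = [v for v in col if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in col]


def clip_outliers(col: Column, low: float = 0.0, high: float = 10.0) -> Column:
    """Clip observed values into [low, high]; missing values pass through."""
    return [None if v is None else min(max(v, low), high) for v in col]


def scale_minmax(col: Column) -> Column:
    """Rescale observed values to [0, 1]; missing values pass through."""
    observed = [v for v in col if v is not None]
    lo, hi = min(observed), max(observed)
    span = hi - lo
    if span == 0:
        return [None if v is None else 0.0 for v in col]
    return [None if v is None else (v - lo) / span for v in col]


def accuracy(col: Column, labels: Sequence[int], threshold: float = 0.5) -> float:
    """Stand-in for a trained model: predict 1 when the value exceeds the threshold."""
    preds = [1 if v is not None and v > threshold else 0 for v in col]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def rank_permutations(
    steps: Sequence[Step], col: Column, labels: Sequence[int]
) -> List[Tuple[float, Tuple[str, ...]]]:
    """Score every ordered, non-empty subset of steps; best accuracy first."""
    results = []
    for r in range(1, len(steps) + 1):
        for pipeline in permutations(steps, r):
            out = list(col)
            for step in pipeline:
                out = step(out)
            names = tuple(s.__name__ for s in pipeline)
            results.append((accuracy(out, labels), names))
    return sorted(results, key=lambda item: item[0], reverse=True)


# Toy data (assumed): one missing value and one extreme outlier.
feature: Column = [1.0, 2.0, None, 8.0, 9.0, 100.0]
labels = [0, 0, 0, 1, 1, 1]

ranked = rank_permutations([impute_median, clip_outliers, scale_minmax], feature, labels)
print("recommended:", ranked[0])
print("worst:", ranked[-1])
```

Exhaustive enumeration grows factorially with the number of functions, so a sketch like this is only practical for a small candidate set; how the actual platform bounds or prunes the search is not stated in the abstract.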